Head model


A Controllable 3D Deepfake Generation Framework with Gaussian Splatting

Liu, Wending, Liang, Siyun, Nguyen, Huy H., Echizen, Isao

arXiv.org Artificial Intelligence

We propose a novel 3D deepfake generation framework based on 3D Gaussian Splatting that enables realistic, identity-preserving face swapping and reenactment in a fully controllable 3D space. Compared to conventional 2D deepfake approaches, which suffer from geometric inconsistencies and limited generalization to novel views, our method combines a parametric head model with dynamic Gaussian representations to support multi-view consistent rendering, precise expression control, and seamless background integration. To address editing challenges in point-based representations, we explicitly separate the head and background Gaussians and use pre-trained 2D guidance to optimize the facial region across views. We further introduce a repair module to enhance visual consistency under extreme poses and expressions. Experiments on NeRSemble and additional evaluation videos demonstrate that our method achieves performance comparable to state-of-the-art 2D approaches in identity preservation, as well as pose and expression consistency, while significantly outperforming them in multi-view rendering quality and 3D consistency. Our approach bridges the gap between 3D modeling and deepfake synthesis, enabling new directions for scene-aware, controllable, and immersive visual forgeries, and revealing the threat that the emerging 3D Gaussian Splatting technique could be used for manipulation attacks.
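As a rough illustration of the head/background separation described above, the hedged sketch below freezes one subset of Gaussian parameters and optimizes only the "head" subset against a placeholder 2D guidance loss; the mask, renderer, and guidance target are hypothetical stand-ins, not the authors' pipeline.

```python
# Minimal sketch (assumptions, not the paper's code): separate "head" and
# "background" Gaussians and optimize only the head subset against a 2D guidance loss.
import torch

N = 10_000                                   # total number of Gaussians
positions = torch.randn(N, 3)
colors = torch.rand(N, 3)
head_mask = positions[:, 2] > 0.5            # hypothetical head/background split

# Only the head Gaussians are optimized; the background stays frozen.
head_colors = colors[head_mask].clone().requires_grad_(True)
optimizer = torch.optim.Adam([head_colors], lr=1e-2)

def render_view(colors_all):                 # stand-in for a differentiable rasterizer
    return colors_all.mean(dim=0)

def guidance_loss(rendered):                 # stand-in for a pre-trained 2D face prior
    target = torch.tensor([0.6, 0.5, 0.45])  # hypothetical target appearance
    return torch.nn.functional.mse_loss(rendered, target)

for step in range(100):
    colors_all = colors.clone()
    colors_all[head_mask] = head_colors      # recombine head + frozen background
    loss = guidance_loss(render_view(colors_all))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```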


Simulating Safe Bite Transfer in Robot-Assisted Feeding with a Soft Head and Articulated Jaw

San, Yi Heng, Ravichandram, Vasanthamaran, Yow, J-Anne, Chan, Sherwin Stephen, Wang, Yifan, Ang, Wei Tech

arXiv.org Artificial Intelligence

Ensuring safe and comfortable bite transfer during robot-assisted feeding is challenging due to the close physical human-robot interaction required. This paper presents a novel approach to modeling physical human-robot interaction in a physics-based simulator (MuJoCo) using soft-body dynamics. We integrate a flexible head model with a rigid skeleton while accounting for internal dynamics, enabling the flexible model to be actuated by the skeleton. Incorporating realistic soft-skin contact dynamics in simulation allows for systematically evaluating bite transfer parameters, such as insertion depth and entry angle, and their impact on user safety and comfort. Our findings suggest that a straight-in-straight-out strategy minimizes forces and enhances user comfort in robot-assisted feeding, assuming a static head. This simulation-based approach offers a safer and more controlled alternative to real-world experimentation. Supplementary videos can be found at: https://tinyurl.com/224yh2kx.
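To make the simulation-based parameter evaluation concrete, here is a minimal MuJoCo sketch that sweeps insertion depth and entry angle and records the peak contact force on a toy rigid "head"; the scene, actuator setup, and parameter ranges are illustrative assumptions and do not reproduce the paper's soft-body head and articulated jaw model.

```python
# Toy MuJoCo scene (an assumption): a fixed "head" sphere and a "spoon" capsule
# driven by depth and angle position actuators; peak contact force is recorded
# for each (insertion depth, entry angle) pair.
import numpy as np
import mujoco

XML = """
<mujoco>
  <worldbody>
    <body name="head">
      <geom name="head" type="sphere" size="0.1"/>
    </body>
    <body name="spoon" pos="0.25 0 0">
      <joint name="depth" type="slide" axis="-1 0 0"/>
      <joint name="angle" type="hinge" axis="0 1 0"/>
      <geom name="spoon" type="capsule" size="0.01" fromto="0 0 0 0.08 0 0"/>
    </body>
  </worldbody>
  <actuator>
    <position joint="depth" kp="100"/>
    <position joint="angle" kp="100"/>
  </actuator>
</mujoco>
"""
model = mujoco.MjModel.from_xml_string(XML)
data = mujoco.MjData(model)

def peak_contact_force(depth, angle_deg, steps=500):
    """Drive the spoon to the given depth/angle targets and track the peak contact force."""
    mujoco.mj_resetData(model, data)
    data.ctrl[:] = [depth, np.deg2rad(angle_deg)]
    peak, force = 0.0, np.zeros(6)
    for _ in range(steps):
        mujoco.mj_step(model, data)
        for i in range(data.ncon):
            mujoco.mj_contactForce(model, data, i, force)
            peak = max(peak, float(np.linalg.norm(force[:3])))
    return peak

for depth in (0.10, 0.15, 0.20):
    for angle in (0, 15, 30):
        print(f"depth={depth:.2f} angle={angle:>2} peak_force={peak_contact_force(depth, angle):.3f}")
```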


Graph convolutional networks enable fast hemorrhagic stroke monitoring with electrical impedance tomography

Toivanen, J., Kolehmainen, V., Paldanius, A., Hänninen, A., Hauptmann, A., Hamilton, S. J.

arXiv.org Artificial Intelligence

Objective: To develop a fast image reconstruction method for stroke monitoring with electrical impedance tomography with image quality comparable to computationally expensive nonlinear model-based methods. Methods: A post-processing approach with graph convolutional networks is employed. Utilizing the flexibility of the graph setting, a graph U-net is trained on linear difference reconstructions from 2D simulated stroke data and applied to fully 3D images from realistic simulated and experimental data. An additional network, trained on 3D vs. 2D images, is also considered for comparison. Results: Post-processing the linear difference reconstructions through the graph U-net significantly improved the image quality, resulting in images comparable to, or better than, those from the time-intensive nonlinear reconstruction method (a few minutes vs. several hours). Conclusion: Pairing a fast reconstruction method, such as linear difference imaging, with post-processing through a graph U-net provided significant improvements at a negligible computational cost. Training in the graph framework, versus the classic pixel-based (CNN) setting, made it possible to train on 2D cross-sectional images and process 3D volumes, providing a nearly 50x savings in data simulation costs with no noticeable loss in quality. Significance: The proposed approach of post-processing a linear difference reconstruction with the graph U-net could be a feasible approach for on-line monitoring of hemorrhagic stroke.
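The sketch below shows, under stated assumptions, how a linear difference reconstruction defined on mesh nodes could be post-processed with PyTorch Geometric's GraphUNet; the mesh connectivity and node features are random placeholders, and only the graph-in, graph-out pattern is meant to mirror the approach, which is what allows a network trained on 2D meshes to also run on 3D ones.

```python
# Minimal sketch (assumptions, not the authors' code): refine a per-node linear
# difference EIT reconstruction with a graph U-net from PyTorch Geometric.
import torch
from torch_geometric.nn import GraphUNet

num_nodes = 2048                       # e.g. mesh nodes of a head model
x = torch.randn(num_nodes, 1)          # linear difference reconstruction per node
edge_index = torch.randint(0, num_nodes, (2, 4 * num_nodes))   # placeholder mesh edges

# The graph U-net maps the rough linear reconstruction to a refined conductivity change.
model = GraphUNet(in_channels=1, hidden_channels=32, out_channels=1,
                  depth=3, pool_ratios=0.5)
refined = model(x, edge_index)
print(refined.shape)                   # (num_nodes, 1)
```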


Advancing fNIRS Neuroimaging through Synthetic Data Generation and Machine Learning Applications

Waks, Eitan

arXiv.org Machine Learning

This study presents an integrated approach for advancing functional Near-Infrared Spectroscopy (fNIRS) neuroimaging through the synthesis of data and application of machine learning models. By addressing the scarcity of high-quality neuroimaging datasets, this work harnesses Monte Carlo simulations and parametric head models to generate a comprehensive synthetic dataset, reflecting a wide spectrum of conditions. We developed a containerized environment employing Docker and Xarray for standardized and reproducible data analysis, facilitating meaningful comparisons across different signal processing modalities. Additionally, a cloud-based infrastructure is established for scalable data generation and processing, enhancing the accessibility and quality of neuroimaging data. The combination of synthetic data generation with machine learning techniques holds promise for improving the accuracy, efficiency, and applicability of fNIRS tomography, potentially revolutionizing diagnostics and treatment strategies for neurological conditions. The methodologies and infrastructure developed herein set new standards in data simulation and analysis, paving the way for future research in neuroimaging and the broader biomedical engineering field.
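As one possible way to standardize such synthetic data for comparison across processing pipelines, the sketch below builds a labelled Xarray Dataset for Monte Carlo-simulated fNIRS intensities; the dimension names, wavelengths, and sampling rate are illustrative assumptions rather than the study's actual schema.

```python
# Minimal sketch (assumptions): organize Monte Carlo-simulated fNIRS measurements
# in an Xarray Dataset with labelled dimensions so different signal processing
# modalities can be compared on the same structure.
import numpy as np
import xarray as xr

n_channels, n_wavelengths, n_time = 32, 2, 1000
ds = xr.Dataset(
    {
        "intensity": (("channel", "wavelength", "time"),
                      np.random.rand(n_channels, n_wavelengths, n_time)),
    },
    coords={
        "channel": np.arange(n_channels),
        "wavelength": [760, 850],              # nm, typical fNIRS wavelengths
        "time": np.arange(n_time) / 10.0,      # seconds at a hypothetical 10 Hz
    },
    attrs={"source": "Monte Carlo simulation (synthetic)"},
)
print(ds)
```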


A Robust eLORETA Technique for Localization of Brain Sources in the Presence of Forward Model Uncertainties

Noroozi, A., Ravan, M., Razavi, B., Fisher, R. S., Law, Y., Hasan, M. S.

arXiv.org Artificial Intelligence

In this paper, we present a robust version of the well-known exact low-resolution electromagnetic tomography (eLORETA) technique, named ReLORETA, to localize brain sources in the presence of different forward model uncertainties. Methods: We first assume that the true lead field matrix is a transformation of the existing lead field matrix distorted by uncertainties and propose an iterative approach to estimate this transformation accurately. Major sources of the forward model uncertainties, including differences in geometry, conductivity, and source space resolution between the real and simulated head models, and misaligned electrode positions, are then simulated to test the proposed method. Results: ReLORETA and eLORETA are applied to simulated focal sources in different regions of the brain in the presence of various noise levels, as well as to real data from a patient with focal epilepsy. The results show that ReLORETA is considerably more robust and accurate than eLORETA in all cases. Conclusion: Having successfully dealt with the forward model uncertainties, ReLORETA proved to be a promising method for real-world clinical applications. Significance: eLORETA is one of the localization techniques that could be used to study brain activity for medical applications such as determining the epileptogenic zone in patients with medically refractory epilepsy. However, the major limitation of eLORETA is sensitivity to the uncertainties in the forward model. Since this problem can substantially undermine its performance in real-world applications where the exact lead field matrix is unknown, developing a more robust method capable of dealing with these uncertainties is of significant interest.
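To convey the idea of estimating a lead field transformation iteratively, the following is a simplified, assumption-laden illustration that alternates a regularized minimum-norm source estimate with a least-squares update of a transformation matrix A; it is not the ReLORETA algorithm itself, and the dimensions and update rule are placeholders.

```python
# Simplified illustration (not ReLORETA): alternately estimate sources with a
# regularized inverse of the current lead field and update a transformation A
# so that A @ L0 better explains the measurements Y.
import numpy as np

rng = np.random.default_rng(0)
n_sensors, n_sources, n_samples = 64, 500, 200
L0 = rng.standard_normal((n_sensors, n_sources))      # nominal (uncertain) lead field
Y = rng.standard_normal((n_sensors, n_samples))       # measured data (placeholder)

A = np.eye(n_sensors)                                 # transformation to estimate
lam = 1e-2
for _ in range(10):
    L = A @ L0
    # Regularized minimum-norm source estimate (stand-in for the eLORETA inverse).
    G = L @ L.T + lam * np.eye(n_sensors)
    S = L.T @ np.linalg.solve(G, Y)
    # Least-squares update of A given the current source estimate.
    X = L0 @ S
    A = Y @ X.T @ np.linalg.pinv(X @ X.T)
print("residual:", np.linalg.norm(Y - A @ L0 @ S))
```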


SplitBeam: Effective and Efficient Beamforming in Wi-Fi Networks Through Split Computing

Bahadori, Niloofar, Matsubara, Yoshitomo, Levorato, Marco, Restuccia, Francesco

arXiv.org Artificial Intelligence

Modern IEEE 802.11 (Wi-Fi) networks extensively rely on multiple-input multiple-output (MIMO) to significantly improve throughput. To correctly beamform MIMO transmissions, the access point needs to frequently acquire a beamforming matrix (BM) from each connected station. However, the size of the matrix grows with the number of antennas and subcarriers, resulting in an increasing amount of airtime overhead and computational load at the station. Conventional approaches come with either excessive computational load or loss of beamforming precision. For this reason, we propose SplitBeam, a new framework where we train a split deep neural network (DNN) to directly output the BM given the channel state information (CSI) matrix as input. We formulate and solve a bottleneck optimization problem (BOP) to keep computation, airtime overhead, and bit error rate (BER) below application requirements. We perform extensive experimental CSI collection with off-the-shelf Wi-Fi devices in two distinct environments and compare the performance of SplitBeam with the standard IEEE 802.11 algorithm for BM feedback and the state-of-the-art DNN-based approach LB-SciFi. Our experimental results show that SplitBeam reduces the beamforming feedback size and computational complexity by up to 81% and 84%, respectively, while maintaining BER within about 10^-3 of existing approaches. We also implement the SplitBeam DNNs on FPGA hardware to estimate the end-to-end BM reporting delay, and show that the latter is less than 10 milliseconds in the most complex scenario, which is the target channel sounding frequency in realistic multi-user MIMO scenarios.
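A minimal sketch of the split-computing idea, under illustrative assumptions: an encoder on the station compresses the CSI into a small bottleneck that is fed back over the air, and a decoder at the access point reconstructs the beamforming matrix. Layer sizes, the quantization step, and the dimensions are hypothetical, not the SplitBeam architecture.

```python
# Minimal sketch (assumptions): a split DNN with a narrow bottleneck that reduces
# the size of the beamforming feedback sent from the station to the access point.
import torch
import torch.nn as nn

csi_dim, bottleneck_dim, bm_dim = 512, 32, 256   # hypothetical flattened sizes

encoder = nn.Sequential(nn.Linear(csi_dim, 128), nn.ReLU(),
                        nn.Linear(128, bottleneck_dim))      # runs on the station
decoder = nn.Sequential(nn.Linear(bottleneck_dim, 128), nn.ReLU(),
                        nn.Linear(128, bm_dim))              # runs on the access point

csi = torch.randn(1, csi_dim)
z = encoder(csi)                                  # compressed feedback (airtime saving)
z_q = torch.round(z * 16) / 16                    # crude stand-in for feedback quantization
bm = decoder(z_q)                                 # reconstructed beamforming matrix
print(z_q.numel(), "values sent instead of", csi.numel())
```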


Unsupervised Learning of Style-Aware Facial Animation from Real Acting Performances

Paier, Wolfgang, Hilsmann, Anna, Eisert, Peter

arXiv.org Artificial Intelligence

This paper presents a novel approach for text/speech-driven animation of a photo-realistic head model based on blend-shape geometry, dynamic textures, and neural rendering. Training a VAE for geometry and texture yields a parametric model for accurate capturing and realistic synthesis of facial expressions from a latent feature vector. Our animation method is based on a conditional CNN that transforms text or speech into a sequence of animation parameters. In contrast to previous approaches, our animation model learns to disentangle and synthesize different acting styles in an unsupervised manner, requiring only phonetic labels that describe the content of the training sequences. For realistic real-time rendering, we train a U-Net that refines rasterization-based renderings by computing improved pixel colors and a foreground matte. We compare our framework qualitatively and quantitatively against recent methods for head modeling as well as facial animation and evaluate the perceived rendering/animation quality in a user study, which indicates large improvements compared to state-of-the-art approaches.
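For readers unfamiliar with the VAE component, the sketch below shows a small variational autoencoder over a flattened geometry-plus-texture feature vector, yielding a latent code that an animation network could drive; the feature and latent sizes are assumptions and the model is not the paper's.

```python
# Minimal sketch (assumptions): a small VAE over flattened geometry+texture features,
# producing a latent expression vector and a reconstruction loss with a KL term.
import torch
import torch.nn as nn

feat_dim, latent_dim = 1024, 64            # hypothetical feature and latent sizes

class FaceVAE(nn.Module):
    def __init__(self):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(feat_dim, 256), nn.ReLU())
        self.mu = nn.Linear(256, latent_dim)
        self.logvar = nn.Linear(256, latent_dim)
        self.dec = nn.Sequential(nn.Linear(latent_dim, 256), nn.ReLU(),
                                 nn.Linear(256, feat_dim))

    def forward(self, x):
        h = self.enc(x)
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)   # reparameterization
        recon = self.dec(z)
        kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
        return recon, kl

model = FaceVAE()
x = torch.randn(8, feat_dim)               # a batch of placeholder feature vectors
recon, kl = model(x)
loss = nn.functional.mse_loss(recon, x) + 1e-3 * kl
print(loss.item())
```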


Development of an automatic 3D human head scanning-printing system

Zhang, Longyu, Han, Bote, Dong, Haiwei, Saddik, Abdulmotaleb El

arXiv.org Artificial Intelligence

In anthropological studies, researchers have been investigating the relationship between facial shape variations and neurological and psychiatric disorders. For example, Hennesy et al. used 3D head models acquired from laser scanners to identify schizophrenia from facial dysmorphic features [3]. A fast algorithm for 3D face reconstruction with uncalibrated photometric stereo technology was also proposed by Qi et al. [4]. Human avatar animation has also become popular with the development of 3D graphics and gaming. Lee and Magnenat-Thalmann introduced a method to reconstruct 3D facial models for animation from two orthogonal images (frontal and profile views) or from range data [5]. Additionally, Kan and Ferko adopted this same principle to build an automatic system in which facial feature matching between two images and a parametrized head model are used to create 3D head models as avatars in 3D games [6]. An important part of a 3D human model is the head model, which can be used to establish standards for the design of products that fit onto the face or head, such as respiratory masks, glasses, helmets, or other head-mounted devices [7]. An interesting initiative was the Size-China project [8,9]. To find the proper fit for Asians, whose head shapes differ from those of Westerners for facial and head products such as helmets, face masks, and caps, and to derive standards from an anthropometric database, Ball et al. created an Asian anthropometric database built from 3D scans of 2000 Asian people using a stationary head-and-face color 3D scanner from Cyberware.


Data-driven Uncertainty Quantification in Computational Human Head Models

Upadhyay, Kshitiz, Giovanis, Dimitris G., Alshareef, Ahmed, Knutsen, Andrew K., Johnson, Curtis L., Carass, Aaron, Bayly, Philip V., Shields, Michael D., Ramesh, K. T.

arXiv.org Machine Learning

Computational models of the human head are promising tools for estimating the impact-induced response of brain, and thus play an important role in the prediction of traumatic brain injury. Modern biofidelic head model simulations are associated with very high computational cost, and high-dimensional inputs and outputs, which limits the applicability of traditional uncertainty quantification (UQ) methods on these systems. In this study, a two-stage, data-driven manifold learning-based framework is proposed for UQ of computational head models. This framework is demonstrated on a 2D subject-specific head model, where the goal is to quantify uncertainty in the simulated strain fields (i.e., output), given variability in the material properties of different brain substructures (i.e., input). In the first stage, a data-driven method based on multi-dimensional Gaussian kernel-density estimation and diffusion maps is used to generate realizations of the input random vector directly from the available data. Computational simulations of a small number of realizations provide input-output pairs for training data-driven surrogate models in the second stage. The surrogate models employ nonlinear dimensionality reduction using Grassmannian diffusion maps, Gaussian process regression to create a low-cost mapping between the input random vector and the reduced solution space, and geometric harmonics models for mapping between the reduced space and the Grassmann manifold. It is demonstrated that the surrogate models provide highly accurate approximations of the computational model while significantly reducing the computational cost. Monte Carlo simulations of the surrogate models are used for uncertainty propagation. UQ of strain fields highlight significant spatial variation in model uncertainty, and reveal key differences in uncertainty among commonly used strain-based brain injury predictor variables.
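A hedged sketch of the second-stage surrogate idea: map input material-property vectors to low-dimensional output coordinates with a Gaussian process and propagate Monte Carlo samples through the cheap surrogate. Here plain PCA stands in for the Grassmannian diffusion maps described above, and all data are random placeholders rather than the authors' simulations.

```python
# Simplified illustration (assumptions, not the authors' framework): a GP surrogate
# from material parameters to reduced strain-field coordinates, used for cheap
# Monte Carlo uncertainty propagation.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF

rng = np.random.default_rng(0)
X = rng.uniform(size=(40, 5))                    # 40 simulations, 5 material parameters
fields = rng.standard_normal((40, 10_000))       # flattened strain fields (placeholder)

pca = PCA(n_components=5).fit(fields)            # low-dimensional output coordinates
Y = pca.transform(fields)

gp = GaussianProcessRegressor(kernel=RBF(length_scale=0.5)).fit(X, Y)

# Monte Carlo propagation through the surrogate instead of the full head model.
X_mc = rng.uniform(size=(1_000, 5))
fields_mc = pca.inverse_transform(gp.predict(X_mc))
print(fields_mc.shape)                           # (1000, 10000) approximate strain fields
```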


On-device Federated Learning with Flower

Mathur, Akhil, Beutel, Daniel J., de Gusmão, Pedro Porto Buarque, Fernandez-Marques, Javier, Topal, Taner, Qiu, Xinchi, Parcollet, Titouan, Gao, Yan, Lane, Nicholas D.

arXiv.org Artificial Intelligence

Federated Learning (FL) allows edge devices to collaboratively learn a shared prediction model while keeping their training data on the device, thereby decoupling the ability to do machine learning from the need to store data in the cloud. Despite the algorithmic advancements in FL, the support for on-device training of FL algorithms on edge devices remains poor. In this paper, we present an exploration of on-device FL on various smartphones and embedded devices using the Flower framework. We also evaluate the system costs of on-device FL and discuss how this quantification could be used to design more efficient FL algorithms.
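As an illustration of the on-device hooks Flower exposes, the sketch below wraps a trivial NumPy "model" in a NumPyClient; the model, dataset size, and server address are placeholders, and the exact client-start call may differ across Flower versions.

```python
# Minimal sketch (assumptions): a Flower NumPyClient with placeholder fit/evaluate
# logic standing in for real on-device training and evaluation.
import numpy as np
import flwr as fl

class TinyClient(fl.client.NumPyClient):
    def __init__(self):
        self.weights = np.zeros(10)                   # placeholder model parameters

    def get_parameters(self, config):
        return [self.weights]

    def fit(self, parameters, config):
        self.weights = parameters[0] + 0.1            # stand-in for local training
        return [self.weights], 100, {}                # params, num examples, metrics

    def evaluate(self, parameters, config):
        loss = float(np.sum(parameters[0] ** 2))      # stand-in for local evaluation
        return loss, 100, {"loss": loss}

if __name__ == "__main__":
    # Connects to a Flower server assumed to be running locally.
    fl.client.start_numpy_client(server_address="127.0.0.1:8080", client=TinyClient())
```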